An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis
Authors
Abstract
We present a newly collected data set of 8,868 gold-standard annotated Arabic Twitter feeds. The corpus is manually labelled for subjectivity and sentiment analysis (SSA), with an inter-annotator agreement of κ = 0.816. In addition, the corpus is annotated with a variety of linguistically motivated feature sets that have previously been shown to improve classification performance. The paper highlights issues that Twitter poses as a genre, such as a mixture of language varieties and topic shifts. Our next step is to extend the current corpus using online semi-supervised learning. A first sub-corpus will be released via the ELRA repository as part of this submission.
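The κ value reported for the manual labelling is an inter-annotator agreement score. Assuming it is Cohen's kappa for two annotators (the abstract does not name the exact variant), it is computed from the observed agreement p_o and the chance agreement p_e as κ = (p_o − p_e) / (1 − p_e). The minimal sketch below illustrates that calculation; the function name `cohens_kappa`, the label names ("pos", "neg", "obj") and the toy annotations are hypothetical and are not drawn from the corpus.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label proportions,
    # summed over labels (Counter returns 0 for labels one annotator never used).
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[lab] * counts_b[lab] for lab in counts_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy annotations (not taken from the corpus).
annotator_1 = ["pos", "pos", "neg", "neg", "obj", "obj"]
annotator_2 = ["pos", "neg", "neg", "neg", "obj", "pos"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))
```

With these toy labels the annotators agree on 4 of 6 items (p_o = 2/3), chance agreement is p_e = 12/36 = 1/3, and κ = 0.5. In practice, scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same two-annotator statistic.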
Similar resources
MHSubLex: Using Metaheuristic Methods for Subjectivity Classification of Microblogs
In Web 2.0, people are free to share their experiences, views, and opinions. One of the problems that arises in Web 2.0 is the sentiment analysis of texts produced by users in outlets such as Twitter. One of the main tasks of sentiment analysis is subjectivity classification. Our aim is to classify the subjectivity of Tweets. To this end, we create subjectivity lexicons in which the words into ...
Evaluating Distant Supervision for Subjectivity and Sentiment Analysis on Arabic Twitter Feeds
Supervised machine learning methods for automatic subjectivity and sentiment analysis (SSA) are problematic when applied to social media, such as Twitter, since they do not generalise well to unseen topics. A possible remedy for this problem is to apply distant supervision (DS) approaches, which learn from large amounts of automatically annotated data. This research empirically evaluates the per...
Subjectivity and Sentiment Analysis of Modern Standard Arabic and Arabic Microblogs
Though much research has been conducted on Subjectivity and Sentiment Analysis (SSA) during the last decade, little work has focused on Arabic. In this work, we focus on SSA for both Modern Standard Arabic (MSA) news articles and dialectal Arabic microblogs from Twitter. We showcase some of the challenges associated with SSA on microblogs. We adopted a random graph walk approach to extend the A...
Subjectivity and Sentiment Annotation of Modern Standard Arabic Newswire
Subjectivity and sentiment analysis (SSA) is an area that has been witnessing a flurry of novel research. However, only a few attempts have been made to build SSA systems for morphologically rich languages (MRL). In the current study, we report efforts to partially bridge this gap. We present a newly labeled corpus of Modern Standard Arabic (MSA) from the news domain manually annotated for subjec...
Saudi Twitter Corpus for Sentiment Analysis
Sentiment analysis (SA) has received growing attention in Arabic language research. However, few studies have directly applied SA to Arabic due to the lack of a publicly available dataset for this language. This paper partially bridges this gap by focusing on one of the Arabic dialects, the Saudi dialect. This paper presents an annotated data set of 4700 for Saudi dialect sentiment a...
Publication date: 2014